Grammar-compressed indexes with logarithmic search time

Abstract

Let a text T[1..n] be the only string generated by a context-free grammar with g (terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of T, can be encoded using G lg G bits. We introduce the first index that uses O(G lg n) bits (precisely, G lg n + (2+ϵ) G lg g for any constant ϵ > 0) and can find the occ occurrences of patterns P[1..m] in time O((m² + occ) lg G). We implement the index and demonstrate its practicality in comparison with the state of the art, on highly repetitive collections.
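The quantities n, g and G in the abstract can be made concrete with a small sketch. The toy grammar below is a hypothetical example, not taken from the paper, and the helper names (`expand`, `grammar_stats`, `space_bound_bits`) are assumptions introduced only for illustration. The code computes the parameters and evaluates the stated space formula; it does not implement the index itself.

```python
from math import log2

# Hypothetical toy grammar (not from the paper): every nonterminal has exactly
# one rule and there are no cycles, so exactly one string is generated.
RULES = {
    "X1": ["a", "b"],
    "X2": ["X1", "X1"],
    "X3": ["X2", "X2"],
    "X4": ["X3", "X3"],
    "S": ["X4", "X4"],
}
START = "S"


def expand(symbol, rules, memo=None):
    """Return the string generated by `symbol`; terminals expand to themselves."""
    if memo is None:
        memo = {}
    if symbol not in rules:
        return symbol
    if symbol not in memo:
        memo[symbol] = "".join(expand(s, rules, memo) for s in rules[symbol])
    return memo[symbol]


def grammar_stats(rules, start):
    """Return (n, g, G) following the definitions in the abstract."""
    symbols = set(rules)                 # nonterminals
    for rhs in rules.values():
        symbols.update(rhs)              # plus every symbol used on a right-hand side
    g = len(symbols)                     # terminal and nonterminal symbols
    G = sum(len(rhs) for rhs in rules.values())  # sum of right-hand-side lengths
    n = len(expand(start, rules))        # length of the generated text T
    return n, g, G


def space_bound_bits(n, g, G, eps):
    """Evaluate G lg n + (2 + eps) G lg g, the space formula quoted in the abstract."""
    return G * log2(n) + (2 + eps) * G * log2(g)


n, g, G = grammar_stats(RULES, START)
print(n, g, G)                                 # prints: 32 7 10
print(G * log2(G))                             # bits for the plain G lg G encoding
print(space_bound_bits(n, g, G, eps=0.5))      # bits under the index's space formula
```

Each doubling rule adds only 2 to G while doubling n, which is why grammars of this shape suit highly repetitive texts.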

Related resources

Improved Grammar-Based Compressed Indexes

We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text T[1..u] that is represented by a (context-free) grammar of n (terminal and nonterminal) symbols and size N (measured as the sum of the lengths of the right hands of the rules), a basic grammar-based representation of T ...

Fast Compressed Self-Indexes with Deterministic Linear-Time Construction

We introduce a compressed suffix array representation that, on a text T of length n over an alphabet of size σ, can be built in O(n) deterministic time, within O(n log σ) bits of working space, and counts the number of occurrences of any pattern P in T in time O(|P| + log log_w σ) on a RAM machine of w = Ω(log n)-bit words. This new index outperforms all the other compressed indexes that can be bu...

Compressed Inverted Indexes for In-Memory Search Engines

We present the algorithmic core of a full-text database that allows fast Boolean queries, phrase queries, and document reporting using less space than the input text. The system uses a carefully choreographed combination of classical data compression techniques and inverted-index-based search data structures. It outperforms suffix-array-based techniques for all the above operations for real wo...

Compressed Text Indexes with Fast Locate

Compressed text (self-)indexes have matured up to a point where they can replace a text by a data structure that requires less space and, in addition to giving access to arbitrary text passages, support indexed text searches. At this point those indexes are competitive with traditional text indexes (which are very large) for counting the number of occurrences of a pattern in the text. Yet, they...

Approximate String Matching with Compressed Indexes

A compressed full-text self-index for a text T is a data structure requiring reduced space and able to search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of interest in compressed indexes, there has not been much progress on functionalities beyond the basic exact search. In this paper we focus on indexed approximate s...

Journal

Journal title: Journal of Computer and System Sciences

Year: 2021

ISSN: 1090-2724, 0022-0000

DOI: https://doi.org/10.1016/j.jcss.2020.12.001